14 research outputs found
Simulating Human Gaze with Neural Visual Attention
Existing models of human visual attention are generally unable to incorporate
direct task guidance and therefore cannot model an intent or goal when
exploring a scene. To integrate guidance of any downstream visual task into
attention modeling, we propose the Neural Visual Attention (NeVA) algorithm. To
this end, we impose the biological constraint of foveated vision on neural
networks and train an attention mechanism to generate visual explorations that
maximize the performance with respect to the downstream task. We observe that
biologically constrained neural networks generate human-like scanpaths without
being trained for this objective. Extensive experiments on three common
benchmark datasets show that our method outperforms state-of-the-art
unsupervised human attention models in generating human-like scanpaths.
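The core loop described in the abstract — constrain the network to foveated input, then pick each fixation to maximize downstream task performance — can be sketched as a toy in Python. The Gaussian acuity falloff (as a blur stand-in), the inhibition-of-return step, and the grid search over fixations are all simplifying assumptions for illustration, not the paper's actual architecture:

```python
import math

def foveate(image, fix, sigma=1.0):
    """Toy foveation: keep full resolution at the fixation point and
    attenuate with distance, mimicking the acuity falloff of the retina."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            d2 = (x - fix[0]) ** 2 + (y - fix[1]) ** 2
            row.append(image[y][x] * math.exp(-d2 / (2 * sigma ** 2)))
        out.append(row)
    return out

def neva_scanpath(image, task_score, n_fixations=3):
    """Greedy NeVA-style exploration: each fixation is chosen to maximize
    the downstream task score of the foveated stimulus."""
    h, w = len(image), len(image[0])
    path = []
    for _ in range(n_fixations):
        best = max(((x, y) for y in range(h) for x in range(w)),
                   key=lambda f: task_score(foveate(image, f)))
        path.append(best)
        # suppress the attended location (inhibition of return)
        image = [[0.0 if (x, y) == best else v
                  for x, v in enumerate(row)] for y, row in enumerate(image)]
    return path
```

With a proxy task score (here simply total foveated intensity, standing in for classifier confidence), the scanpath visits the most task-relevant regions in order.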
Adversarial Attacks and Defenses in Large Language Models: Old and New Threats
Over the past decade, there has been extensive research aimed at enhancing
the robustness of neural networks, yet the problem remains largely unsolved.
Here, one major impediment has been the overestimation of the robustness of new
defense approaches due to faulty defense evaluations. Flawed robustness
evaluations necessitate rectifications in subsequent works, dangerously slowing
down the research and providing a false sense of security. In this context, we
will face substantial challenges associated with an impending adversarial arms
race in natural language processing, specifically with closed-source Large
Language Models (LLMs), such as ChatGPT, Google Bard, or Anthropic's Claude. We
provide a first set of prerequisites to improve the robustness assessment of
new approaches and reduce the amount of faulty evaluations. Additionally, we
identify embedding space attacks on LLMs as another viable threat model for
generating malicious content in open-source models. Finally, we
demonstrate on a recently proposed defense that, without LLM-specific best
practices in place, it is easy to overestimate the robustness of a new
approach.
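The embedding space threat model mentioned above can be illustrated schematically: instead of searching over discrete tokens, the attacker does gradient descent directly on the continuous prompt embeddings, which is only possible with white-box access to an open-source model. The quadratic toy loss below stands in for the model's loss on a target malicious continuation; it is an assumption for illustration, not the paper's attack:

```python
def embedding_attack(loss_grad, emb, lr=0.1, steps=100):
    """Schematic embedding-space attack: plain gradient descent on the
    continuous prompt embeddings (no discrete-token constraint).
    loss_grad maps an embedding to the gradient of the attack loss."""
    for _ in range(steps):
        g = loss_grad(emb)
        emb = [e - lr * gi for e, gi in zip(emb, g)]
    return emb

# Toy loss: squared distance to a target embedding t, so the gradient
# at e is 2 * (e - t); the attack converges to t.
t = [1.0, -2.0]
adv = embedding_attack(lambda e: [2 * (ei - ti) for ei, ti in zip(e, t)],
                       [0.0, 0.0])
```

Because the optimization variable lives in a continuous space, standard first-order methods apply directly — one reason this threat model is hard to defend against in open-source settings.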
Behind the Machine's Gaze: Biologically Constrained Neural Networks Exhibit Human-like Visual Attention
By and large, existing computational models of visual attention tacitly
assume perfect vision and full access to the stimulus and thereby deviate from
foveated biological vision. Moreover, modelling top-down attention is generally
reduced to the integration of semantic features without incorporating the
signal of high-level visual tasks, which have been shown to partially guide human
attention. We propose the Neural Visual Attention (NeVA) algorithm to generate
visual scanpaths in a top-down manner. With our method, we explore the ability
of neural networks on which we impose the biological constraints of foveated
vision to generate human-like scanpaths. The scanpaths are thereby generated
to maximize the performance with respect to the underlying visual task (i.e.,
classification or reconstruction). Extensive experiments show that the proposed
method outperforms state-of-the-art unsupervised human attention models in
terms of similarity to human scanpaths. Additionally, the flexibility of the
framework allows us to quantitatively investigate the role of different tasks in
the generated visual behaviours. Finally, we demonstrate the superiority of the
approach in a novel experiment that investigates the utility of scanpaths in
real-world applications with imperfect viewing conditions.
FastAMI -- a Monte Carlo Approach to the Adjustment for Chance in Clustering Comparison Metrics
Clustering is at the very core of machine learning, and its applications
proliferate with the increasing availability of data. However, as datasets
grow, comparing clusterings with an adjustment for chance becomes
computationally difficult, preventing unbiased ground-truth comparisons and
solution selection. We propose FastAMI, a Monte Carlo-based method to
efficiently approximate the Adjusted Mutual Information (AMI) and extend it to
the Standardized Mutual Information (SMI). The approach is compared with the
exact calculation and a recently developed variant of the AMI based on pairwise
permutations, using both synthetic and real data. In contrast to the exact
calculation our method is fast enough to enable these adjusted
information-theoretic comparisons for large datasets while maintaining
considerably more accurate results than the pairwise approach. Comment: Accepted at AAAI 202
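The Monte Carlo idea behind FastAMI can be sketched in a few lines: instead of the exact (and expensive) expected mutual information under the permutation model, one estimates it by shuffling one labeling and averaging. The sample count, the average-entropy normalizer, and the exact estimator details below are simplifying assumptions, not the paper's algorithm:

```python
import math
import random
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def mutual_info(u, v):
    n = len(u)
    joint, pu, pv = Counter(zip(u, v)), Counter(u), Counter(v)
    return sum(c / n * math.log((c / n) / (pu[a] / n * pv[b] / n))
               for (a, b), c in joint.items())

def fastami(u, v, n_samples=200, seed=0):
    """Monte Carlo AMI: estimate E[MI] under the permutation model by
    shuffling one labeling, then normalize as (MI - E[MI]) / (H - E[MI])."""
    rng = random.Random(seed)
    mi = mutual_info(u, v)
    perm, emi = list(v), 0.0
    for _ in range(n_samples):
        rng.shuffle(perm)
        emi += mutual_info(u, perm)
    emi /= n_samples
    denom = 0.5 * (entropy(u) + entropy(v)) - emi
    return (mi - emi) / denom if denom else 0.0
```

Identical clusterings score 1 by construction, while unrelated clusterings land near 0 — the adjustment for chance that becomes intractable to compute exactly on large datasets.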
CLIP: Cheap Lipschitz Training of Neural Networks
Despite the large success of deep neural networks (DNN) in recent years, most
neural networks still lack mathematical guarantees in terms of stability. For
instance, DNNs are vulnerable to small or even imperceptible input
perturbations, so-called adversarial examples, that can cause false
predictions. This instability can have severe consequences in applications
which influence the health and safety of humans, e.g., biomedical imaging or
autonomous driving. While bounding the Lipschitz constant of a neural network
improves stability, most methods rely on restricting the Lipschitz constants of
each layer, which yields a loose bound on the actual Lipschitz constant.
In this paper we investigate a variational regularization method named CLIP
for controlling the Lipschitz constant of a neural network, which can easily be
integrated into the training procedure. We mathematically analyze the proposed
model, in particular discussing the impact of the chosen regularization
parameter on the output of the network. Finally, we numerically evaluate our
method on both a nonlinear regression problem and the MNIST and Fashion-MNIST
classification databases, and compare our results with a weight regularization
approach. Comment: 12 pages, 2 figures, accepted at SSVM 202
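The regularization idea can be sketched on a 1-D linear model, where the Lipschitz constant is just |w| and the penalized loss is squared error plus λ times the difference quotient |f(x) − f(y)| / |x − y| on a probe pair. The fixed probe pair, the model, and the hyperparameters are toy assumptions; the paper updates such points adversarially during training:

```python
def lipschitz_quotient(f, x, y):
    """Difference quotient used as a computable stand-in for the
    Lipschitz constant of f on the pair (x, y)."""
    return abs(f(x) - f(y)) / abs(x - y)

def clip_train(data, lam=0.5, lr=0.01, steps=500):
    """Toy 1-D CLIP-style training: gradient descent on
    sum (w*x - t)^2 + lam * |w|, since the quotient of a linear
    model f(x) = w*x equals |w|."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - t) * x for x, t in data)
        grad += lam * (1 if w > 0 else -1 if w < 0 else 0)
        w -= lr * grad
    f = lambda x: w * x
    return f, lipschitz_quotient(f, 0.0, 1.0)
```

On the single data point (1.0, 2.0) the unregularized fit would have slope 2; with λ = 0.5 the penalized optimum is w = 2 − λ/2 = 1.75, illustrating how the regularizer trades a little fit for a smaller Lipschitz constant.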
Contrastive Language-Image Pretrained Models are Zero-Shot Human Scanpath Predictors
Understanding the mechanisms underlying human attention is a fundamental
challenge for both vision science and artificial intelligence. While numerous
computational models of free-viewing have been proposed, less is known about
the mechanisms underlying task-driven image exploration. To address this gap,
we present CapMIT1003, a database of captions and click-contingent image
explorations collected during captioning tasks. CapMIT1003 is based on the same
stimuli from the well-known MIT1003 benchmark, for which eye-tracking data
under free-viewing conditions is available; this offers a promising
opportunity to study human attention under both tasks concurrently. We make
this dataset publicly available to facilitate future research in this field. In
addition, we introduce NevaClip, a novel zero-shot method for predicting visual
scanpaths that combines contrastive language-image pretrained (CLIP) models
with biologically-inspired neural visual attention (NeVA) algorithms. NevaClip
simulates human scanpaths by aligning the representation of the foveated visual
stimulus and the representation of the associated caption, employing
gradient-driven visual exploration to generate scanpaths. Our experimental
results demonstrate that NevaClip outperforms existing unsupervised
computational models of human visual attention in terms of scanpath
plausibility, for both captioning and free-viewing tasks. Furthermore, we show
that conditioning NevaClip with incorrect or misleading captions leads to
random behavior, highlighting the significant impact of caption guidance in the
decision-making process. These findings contribute to a better understanding of
mechanisms that guide human attention and pave the way for more sophisticated
computational approaches to scanpath prediction that can integrate direct
top-down guidance from downstream tasks.
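The alignment objective described above — choose fixations so that the embedding of the foveated stimulus matches the caption embedding — can be sketched as a single selection step. The grid search over candidate fixations is a toy stand-in for the paper's gradient-driven exploration, and the embedding functions below are assumptions, not CLIP itself:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = (math.sqrt(sum(x * x for x in a))
           * math.sqrt(sum(y * y for y in b)))
    return num / den if den else 0.0

def nevaclip_step(candidates, embed_foveated, caption_emb):
    """One NevaClip-style step: pick the candidate fixation whose
    foveated-stimulus embedding best aligns with the caption embedding."""
    return max(candidates,
               key=lambda f: cosine(embed_foveated(f), caption_emb))
```

With a misleading caption embedding, no candidate aligns well and the argmax becomes essentially arbitrary — matching the abstract's observation that incorrect captions lead to random exploration behavior.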
System Design for a Data-driven and Explainable Customer Sentiment Monitor
The most important goal of customer services is to keep the customer
satisfied. However, service resources are always limited and must be
prioritized. Therefore, it is important to identify customers who potentially
become unsatisfied and might lead to escalations. Today this prioritization of
customers is often done manually. Data science on IoT data (especially log
machine health monitoring, as well as analytics on enterprise data for customer
relationship management (CRM) have mainly been researched and applied
independently. In this paper, we present a framework for a data-driven decision
support system which combines IoT and enterprise data to model customer
sentiment. Such decision support systems can help to prioritize customers and
service resources to effectively troubleshoot problems or even avoid them. The
framework is applied in a real-world case study with a major medical device
manufacturer. This includes a fully automated and interpretable machine
learning pipeline designed to meet the requirements defined with domain experts
and end users. The overall framework is currently deployed, learns and
evaluates predictive models from terabytes of IoT and enterprise data to
actively monitor the customer sentiment for a fleet of thousands of high-end
medical devices. Furthermore, we provide an anonymized industrial benchmark
dataset for the research community.